Comparison Of HTML Parsers
   HOME

TheInfoList



OR:

HTML parsers are software for automated
Hypertext Markup Language The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript ...
(HTML)
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
. They have two main purposes: * HTML traversal: offer an interface for programmers to easily access and modify the "HTML string code". Canonical example: DOM parsers. * HTML clean: to fix invalid HTML and to improve the layout and indent style of the resulting markup. Canonical example:
HTML Tidy HTML Tidy is a console application for correcting invalid HyperText Markup Language (HTML), detecting potential web accessibility errors, and for improving the layout and indent style of the resulting markup. It is also a cross-platform library ...
. : * Latest release (of significant changes) date. : ** ''sanitize'' (generating standard-compatible web-page, reduce spam, etc.) and ''clean'' (strip out surplus presentational tags, remove XSS code, etc.) HTML code. : *** Updates HTML4.X to XHTML or to HTML5, converting deprecated tags (ex. CENTER) to valid ones (ex. DIV with style="text-align:center;").


References

{{Reflist HTML parsers
HTML parsers The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript ...